Introduction to geospatial mapping in R

Author

Paul Magwene

Libraries

Install the sf and ggspatial packages via R package manager.

library(tidyverse)
library(patchwork)

library(sf)
library(ggspatial) # nice extras

Data: Shapefile

We’re going to get data on country, state, and county boundaries from a website called “Natural Earth”](https://www.naturalearthdata.com/). Natural Earth is a public domain map data set available at 3 different scales. The data is free to use and the development of Natural Earth is supported by the North American Cartographic Information Society.

The downloads page on Natural Earth provides links to different types of features (cultural (political), physical, raster) at 3 different scales. We’ll use the large scale cultural data (1:10m).

Download and de-compress the zip files for:

Place the directories extracted from the zip files your working directory before proceeding. You should have the following three directories in your working directory:

  1. ne_10m_admin_0_countries
  2. ne_10m_admin_1_states_provinces
  3. ne_10m_admin_2_counties

Reading shape files

In the sf package, all function that operate on spatial data are prefixed by st_ (referring to to spatial type).

The function to read a simple features file (or other common GIS data files) is sf::st_read. Let’s start by reading the Natural Earth countries file:

countries <-
    st_read("ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp")
Reading layer `ne_10m_admin_0_countries' from data source 
  `/Users/pmagwene/gits/Bio724D_2024_2025/class_notes/13_slides_mapplotting/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 258 features and 168 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
Geodetic CRS:  WGS 84

SF objects are data frames

typeof(countries)
[1] "list"
class(countries)
[1] "sf"         "data.frame"
dim(countries)
[1] 258 169

Where’s the geometry?

Look at the countries object in your data viewer. You’ll see lots of variables (columns) with some obvious and non-obvious meanings. The very last column is called “geometry”, and the values of this column are lists that represent the geometric objects of the mapping (country boundaries in this case)

Querying simple features objects about the CRS

We can ask a simple features object about it’s coordinate reference system:

st_crs(countries)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Mapping a simple features data frame

Mapping a data frame representing simple features data in ggplot is easy using the geom_sf object.

The default coordinate system used is WGS84 with a Mercator projection:

ggplot() +
  geom_sf(data = countries) 

Creating an SF object on the fly

Let’s create a SF point object representing where Durham is on the map.

# create a data frame with the latitude and longitude of durham
durham <- tibble(long = -78.9, lat = 36) 
durham
# A tibble: 1 × 2
   long   lat
  <dbl> <dbl>
1 -78.9    36

Now convert that data frame to an sf object using st_as_sf, and specify the coordinate reference system:

durham_sf <- st_as_sf(durham, 
                      coords = c("long","lat"),
                      crs=4326)

durham_sf
Simple feature collection with 1 feature and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -78.9 ymin: 36 xmax: -78.9 ymax: 36
Geodetic CRS:  WGS 84
# A tibble: 1 × 1
     geometry
* <POINT [°]>
1  (-78.9 36)

To plot this point data we can simply include it as another geom_sf layer:

ggplot() +
  geom_sf(data = countries) +
  geom_sf(data = durham_sf, color = "red")

Mapping with a different CRS

If we want to use a different coordinate system we can specify that using the coord_sf function. Here we use EPS:3035 which uses the Lambert Azimuthal Equal Area projection.

Note that because we specified our point data (location of Durham) as an sf object the CRS transform also applies to that data:

ggplot(data = countries) +
    geom_sf() +
    geom_sf(data = durham_sf, color="red") +
    coord_sf(crs = st_crs("EPSG:3035")) 

Chloropleth plots

A chloropleth plot is a map in which the areas or regions are colored with respect to a numerical variable. The Natural Earth country data files include information on population (POP_EST) so let’s plot this numerical variable on our map. We’ll also demonstrate how to zoom in on a particular map region (note xlim = longitude limits, ylim = latitude limits)

ggplot() +
  geom_sf(data = countries,
          aes(fill = POP_EST)) +
  coord_sf(xlim = c(-20, 97), ylim = c(60, -37), expand = FALSE) +
  scale_fill_viridis_c(option = "inferno", trans = "log10") +
  labs(fill = "Log10(Population)")
Warning in scale_fill_viridis_c(option = "inferno", trans = "log10"): log-10
transformation introduced infinite values.

Filtering an SF data frame

Let’s load the states/provinces SF file and demonstrate that we can filter the resulting data structures just like any other data frame:

states <-
  st_read("ne_10m_admin_1_states_provinces/ne_10m_admin_1_states_provinces.shp")
Reading layer `ne_10m_admin_1_states_provinces' from data source 
  `/Users/pmagwene/gits/Bio724D_2024_2025/class_notes/13_slides_mapplotting/ne_10m_admin_1_states_provinces/ne_10m_admin_1_states_provinces.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 4596 features and 121 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
Geodetic CRS:  WGS 84
ggplot() +
  geom_sf(data = states) +
  coord_sf()

Let’s filter the states data frame to just include those in the United States.

us_states <- 
  states |>
  filter(admin == "United States of America")
ggplot() +
  geom_sf(data = us_states) +
  coord_sf() 

That plot looks a little funny (can you figure out why?). Let’s try a different coordinate reference system:

ggplot() +
  geom_sf(data = us_states) +
  coord_sf(crs = st_crs("EPSG:5072")) 

Let’s further filter the US states to only include those in the continental US and then use patchwork to compare the Mercator projection to an area preserving Albers projection:

continental_us <- 
  us_states |>
  filter(!(name %in% c("Alaska","Hawaii")))

us1 <-
  ggplot() +
  geom_sf(data = continental_us) +
  coord_sf()
  
us2 <-
  ggplot() +
  geom_sf(data = continental_us) +
  coord_sf(crs = st_crs("EPSG:6350")) 

us1 + us2 # layout via patchwork

Zooming in further: US counties and North Carolia

Let’s get even further levels of detail by plotting US county boundaries.

As we did above, we’ll focus on the continental US. In this data, the REGION column is what we need to sort on. Besides Hawaii (HI) and Alaska (AK) we also need to filter out Puerto Rico (PR) and the Virgin Islands (VI).

us_counties <- 
  st_read("ne_10m_admin_2_counties/ne_10m_admin_2_counties.shp")
Reading layer `ne_10m_admin_2_counties' from data source 
  `/Users/pmagwene/gits/Bio724D_2024_2025/class_notes/13_slides_mapplotting/ne_10m_admin_2_counties/ne_10m_admin_2_counties.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 3224 features and 61 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1435 ymin: 17.68277 xmax: 179.7809 ymax: 71.4125
Geodetic CRS:  WGS 84
continental_counties <-
  us_counties |>
  filter(!(REGION %in% c("AK","HI","PR", "VI")))
ggplot() +
  geom_sf(data = continental_counties)

Focusing on North Carolina

north_carolina <-
  us_counties |>
  filter(REGION == "NC")

ggplot() +
  geom_sf(data = north_carolina)

Plotting localilies on our NC map

As a final example, let’s plot the locations in North Carolina where there have been reported citings of rare and beautiful North Carolina Blue Snouters.

nc_snouters <- 
  read_tsv("https://raw.githubusercontent.com/Bio724D/Bio724D_2024_2025/refs/heads/main/data/nc_snouter_sites.tsv")
Rows: 24 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (2): latitude, longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nc_snouters)
# A tibble: 6 × 2
  latitude longitude
     <dbl>     <dbl>
1     36.3     -79.4
2     36.0     -79.0
3     36.0     -79.1
4     35.9     -79.0
5     35.9     -78.9
6     35.9     -78.8
snouters_sf <-
  st_as_sf(nc_snouters, 
           coords = c("longitude","latitude"),
           crs=4326)
ggplot() +
  geom_sf(data = north_carolina, fill = "antiquewhite") +
  geom_sf(data = snouters_sf, 
             color="firebrick",
             size = 2,
             shape = 9) +
  annotation_scale(location = "bl", width_hint = 0.4) +
  labs(title = "Locations of NC Blue Snouter citings",
       x = "Longitude", y = "Latitude") +
  theme_minimal()